Mining chemical information from open patents

Authors

  • David M. Jessop
  • Samuel E. Adams
  • Peter Murray-Rust
Abstract

Linked Open Data presents an opportunity to vastly improve the quality of science in all fields by increasing the availability and usability of the data on which it is based. In chemistry, a huge amount of information is available in the published literature, the vast majority of which is not available in machine-understandable formats. PatentEye, a prototype system for the extraction and semantification of chemical reactions from the patent literature, has been implemented and is discussed. A total of 4444 reactions were extracted from 667 patent documents comprising 10 weeks' worth of publications from the European Patent Office (EPO), with a precision of 78% and a recall of 64% with regard to determining the identity and amount of the reactants employed, and an accuracy of 92% with regard to product identification. NMR spectra reported as product characterisation data are additionally captured.
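
For readers unfamiliar with the two metrics quoted in the abstract, the following is a minimal sketch of how precision and recall are conventionally computed from counts of correct, spurious, and missed extractions. The function name and the counts are hypothetical: the counts were chosen only so that the ratios reproduce the reported 78% and 64%, and they are not taken from the paper's evaluation data.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) computed from raw extraction counts.

    precision = correct extractions / all extractions made by the system
    recall    = correct extractions / all extractions that should have been made
    """
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall


# Hypothetical counts (NOT from the paper), picked to match the reported ratios.
precision, recall = precision_recall(
    true_positives=624,   # reactants identified correctly
    false_positives=176,  # reactants reported by the system but wrong
    false_negatives=351,  # reactants present in the text but missed
)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

A precision higher than recall, as reported here, means the system's extracted reactants are usually right, but a larger share of the reactants actually present in the text is missed.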

Similar resources

Mining Patents with tmChem, GNormPlus and an Ensemble of Open Systems

The significant amount of medicinal chemistry information contained in patents makes them an attractive target for text mining. The CHEMDNER task at BioCreative V focused on information extraction from patents. This manuscript describes our submissions to the CEMP (chemical named entity recognition) and GPRO (gene and related object identification) subtasks. Our CEMP submission is an ensemble of...

Comparing manual and automated extraction of chemical entities from documents

The chemical information landscape is changing rapidly with a yearly increase of over 1 million new compounds and more than 700,000 publications related to chemistry [1]. Exploring the chemical space covered by relevant journals and patents is a crucial step in early stage medicinal chemistry projects. Extracting chemical entities from unstructured text is a complex task and different approache...

Expanding opportunities for mining bioactive chemistry from patents

Bioactive structures published in medicinal chemistry patents typically exceed those in papers by at least twofold and may precede them by several years. The Big-Bang of open automated extraction since 2012 has contributed to over 15 million patent-derived compounds in PubChem. While mapping between chemical structures, assay results and protein targets from patent documents is challenging, the...

Information Extraction from Chemical patents

The development of new chemicals or pharmaceuticals is preceded by an in-depth analysis of published patents in this field. This information retrieval is a costly and time-inefficient step when done by a human reader, yet it is mandatory for the potential success of an investment. The goal of the research project UIMA-HPC is to automate and hence speed up the process of knowledge mining about patent...

Overview of the CHEMDNER patents task

A considerable effort has been made to extract biological and chemical entities, as well as their relationships, from the scientific literature, either manually through traditional literature curation or by using information extraction and text mining technologies. Medicinal chemistry patents contain a wealth of information, for instance to uncover potential biomarkers that might play a role in...

Information Retrieval and Text Mining Technologies for Chemistry.

Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical document...

Journal title:

Volume 3, Issue

Pages -

Publication date: 2011